Cloud Interview Guide v7 (Self-Contained HTML with Images)

Page 1

Table of Contents
Cloud With Raj www.cloudwithraj.com
- AWS Services for Interviews
- Microservice with ALB
- Event Driven Architecture (Basic)
- Microservice Vs Event Driven
- Event Driven Architectures (Advanced)
- Three-Tier Architecture
- Availability Zone & Data Center
- Lambda Cheaper Than EC2?
- RPO Vs RTO
- IP Address Vs URL
- DevOps Phases
- CI Vs Continuous Delivery Vs Continuous Deployment
- Kubernetes Tools Landscape
- Traditional CICD Vs GitOps
- Platform Vs Developer Team
- Scaling EC2 Vs Lambda Vs EKS
- Kubernetes (EKS) Scaling
- Gen AI Layers
- Prompt Engineering Vs RAG Vs Fine Tuning
- RAG with Bedrock
- EKS Upgrade With Karpenter
- EKS Upgrade With Karpenter (Advanced)
- Container Lifecycle (Local to Cloud)
- Gen AI Multi Model Invocation
- Git Workflow
- DevSecOps Workflow
- What Happens When You Type a URL
- AWS Well Architected Framework
- AWS Migration Tools
- Multi-site Active Active DR
- Kubernetes Node Pod Container Relationship
- Running Batch Workloads on AWS
- STAR Interview Format
- Hybrid Cloud Architecture
- How Karpenter Saves Money
- API Gateway Auth
- Serverless Web Application
- EDA with Kubernetes
- EDA with SNS, SQS, Kubernetes
- EKS Auto (re:Invent 2024)
- Aurora DSQL (re:Invent 2024)
- S3 Tables (re:Invent 2024)
- Docker Vs Kubernetes
- System Design Trade-Offs
- Top 3 Popular Design
- Three Tier with Microservice
- Pod Container Sidecar
- Karpenter Bin Pack Granular Control
- Monolith Vs Microservice
- EventBridge Cross Account
- Kubernetes Tech Stack
- Lambda Transform
- Microservice Tech Stack
- Live Streaming
- Live Streaming with Ads
More To Be Added...
AWS Cloud Interview Guide
Raj's Bio
- Principal SA at [company logo] (6+ Years)
- 20+ Years of IT experience
- Designed and implemented multiple world-scale projects with official AWS blogs on them
- Trained students to get SA jobs via SA Bootcamp
- Bestselling author (60,000+ paid students)
- Presented highly rated talks at major events
- LinkedIn Top Systems Design Voice
Lighthouse projects designed: Dr. B. Covid Vaccine Registration, Collins Flight Systems used by 100+ airlines, Freddie Mac Datacenter to AWS Migration, and more.
Please follow me on socials.

Page 2

Important AWS Services for Interviews
- Compute: EC2, Auto Scaling, Lambda, Elastic Kubernetes Service (EKS), ECS
- Storage: S3, EBS, RDS, DynamoDB, ElastiCache
- Network: VPC, Load Balancer, API Gateway
- AWS Global Infrastructure: Region, AZ
- Security: KMS, IAM, WAF, Shield, GuardDuty, Secrets Manager, Config
- Migration: DMS, Migration Hub, Application Migration Service, Application Discovery Service
- Gen AI: Bedrock, Q, PartyRock, SageMaker
- Event Driven: SNS, SQS, EventBridge, Step Functions
- Observability: CloudWatch, CloudTrail, X-Ray
- Cost Optimization: Compute Optimizer, CloudWatch Insights, Cost Explorer, Budget, Spot Instance, Reserved Instance Reporting, Savings Plan
- Analytics: Glue, EMR, Athena, QuickSight, Kinesis
- DevOps: CloudFormation

Page 3

Microservices with ALB
[Diagram: path-based routing on an ALB for the domain cloudwithraj.com]
- /browse -> Target Group 1 -> Microservice 1: EC2 instances in an Auto Scaling Group (scaled up as needed), backed by Amazon Aurora. Handles traffic for cloudwithraj.com/browse.
- /buy -> Target Group 2 -> Microservice 2: Lambda, backed by Amazon DynamoDB. Handles traffic for cloudwithraj.com/buy.
- /* (catch-all) -> Target Group 3 -> Microservice 3: EKS, backed by Amazon Aurora. Handles all other traffic.
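Here is a minimal Python sketch of the path-based routing decision described on this page. The rule list, the target-group names, and the use of `fnmatchcase` to approximate ALB path-pattern wildcards are illustrative assumptions. A real ALB evaluates listener rules that you configure in AWS, not this code.

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules, evaluated in priority order (lowest number first).
RULES = [
    (1, "/browse*", "target-group-1"),  # Microservice 1: EC2 Auto Scaling Group + Aurora
    (2, "/buy*", "target-group-2"),     # Microservice 2: Lambda + DynamoDB
]
DEFAULT_TARGET = "target-group-3"       # Catch-all (/*): Microservice 3 on EKS + Aurora


def route(path: str) -> str:
    """Return the target group for a request path.

    fnmatchcase only approximates ALB path patterns (case-sensitive, '*' and
    '?' wildcards). A real ALB can also match on host, headers, query strings, etc.
    """
    for _priority, pattern, target in sorted(RULES):
        if fnmatchcase(path, pattern):
            return target
    return DEFAULT_TARGET  # no rule matched: the default action handles it


if __name__ == "__main__":
    for p in ["/browse", "/browse/shoes", "/buy", "/about"]:
        print(p, "->", route(p))
```

The catch-all is modelled as the listener's default action, so any path that no rule matches still reaches Microservice 3.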

Page 4

Event Driven Architecture (Basic)
[Diagram: API Gateway -> SQS (event store) -> Lambda]
An event-driven architecture decouples the producer and the processor. In this example, the producer (a human) invokes an API and sends information in a JSON payload. API Gateway puts it into an event store (SQS), and the processor (Lambda) picks it up and processes it. Note that API Gateway and Lambda can scale, and be managed and deployed, independently.
Benefits of an event-driven architecture:
1. Scale and fail independently - When your services are decoupled, they are only aware of the event router, not each other. This means your services are interoperable, but if one service fails, the rest keep running. The event router acts as an elastic buffer that accommodates surges in workloads.
2. Develop with agility - You no longer need to write custom code to poll, filter, and route events; the event router automatically filters and pushes events to consumers. The router also removes the need for heavy coordination between producer and consumer services, which speeds up development.
3. Audit with ease - An event router acts as a centralized location to audit your application and define policies. These policies can restrict who can publish and subscribe to a router and control which users and resources have permission to access your data. You can also encrypt your events both in transit and at rest.
4. Cut costs - Event-driven architectures are push-based, so everything happens on demand as the event presents itself in the router. You're not paying for continuous polling to check for an event, which means less network bandwidth consumption, less CPU utilization, less idle fleet capacity, and fewer SSL/TLS handshakes.
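To make the decoupling concrete, here is a small in-process Python simulation. A `deque` stands in for SQS, and the hypothetical functions `api_gateway_put` and `lambda_process` stand in for API Gateway and Lambda. In AWS, Lambda's SQS event source mapping does the polling for you. This sketch only shows that the producer returns immediately while the processor drains events at its own pace.

```python
from collections import deque

# In-memory stand-in for SQS: producer and processor only share this queue.
event_store: deque = deque()


def api_gateway_put(payload: dict) -> str:
    """Producer side: enqueue the request and acknowledge right away."""
    event_store.append(payload)
    return "accepted"


def lambda_process(batch_size: int) -> list:
    """Processor side: take up to batch_size events, independently of the producer."""
    processed = []
    while event_store and len(processed) < batch_size:
        event = event_store.popleft()
        processed.append(event["order_id"])
    return processed


if __name__ == "__main__":
    # A burst of 5 requests: the producer never waits for the processor.
    for i in range(5):
        api_gateway_put({"order_id": i})
    print(lambda_process(batch_size=2))  # first batch
    print(lambda_process(batch_size=2))  # second batch
    print(len(event_store))              # events still buffered in the queue
```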

Page 5

Microservice Vs Event Driven
[Diagram, domain cloudwithraj.com:
- Microservice: API Gateway /buy (POST) -> Lambda -> Amazon DynamoDB. Lambda handles the traffic of cloudwithraj.com/buy (POST) directly.
- Microservice with Event Driven: API Gateway /buy (POST) -> SQS (event store) -> Lambda -> Amazon DynamoDB. Lambda processes messages from SQS for cloudwithraj.com/buy (POST) and sends the response to the user through a WebSocket API.]
The main differences are:
1. A traditional microservice is synchronous, i.e., the request and response happen within the same invocation. With event-driven, the user gets a confirmation that the message was inserted into SQS, but does not get the result of the actual message processing by Lambda in the same invocation. Instead, the backend Lambda needs to send the response out to the user, in this case using WebSocket APIs, or the user can query the status afterwards.
2. With EDA, API Gateway and Lambda/database can scale independently. Lambda can consume messages at a rate that does not overwhelm the database (for example, by setting a maximum concurrency on the SQS event source).
3. With EDA, retries are built in. With microservices, if Lambda fails, the user needs to send the request again. With EDA, once the message is in SQS, even if Lambda fails, the message becomes visible again after its visibility timeout and is retried automatically; if a dead-letter queue is configured, it is moved there after repeated failures.
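Point 3 can be modelled with a toy Python queue. `SimulatedQueue` and the `max_receive_count` parameter are hypothetical names that loosely mirror SQS's visibility-timeout redelivery and the `maxReceiveCount` setting of a redrive policy. Time is not simulated: a failed message simply goes back into the visible queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    receive_count: int = 0  # how many times a consumer has received this message


class SimulatedQueue:
    """Toy model of SQS redelivery with a redrive policy.

    A message whose processing fails is not deleted, so it becomes visible
    again (as if its visibility timeout expired) and is retried. After
    max_receive_count failed receives it is moved to the dead-letter queue.
    """

    def __init__(self, max_receive_count: int):
        self.max_receive_count = max_receive_count
        self.visible: deque = deque()
        self.dlq: list = []

    def send(self, body: str) -> None:
        self.visible.append(Message(body))

    def poll(self, handler) -> None:
        while self.visible:
            msg = self.visible.popleft()
            msg.receive_count += 1
            try:
                handler(msg.body)  # success: the message is deleted (not re-queued)
            except Exception:
                if msg.receive_count >= self.max_receive_count:
                    self.dlq.append(msg)      # stop retrying and park it for inspection
                else:
                    self.visible.append(msg)  # becomes visible again for a retry


if __name__ == "__main__":
    attempts = {}

    def flaky_lambda(body: str) -> None:
        # Hypothetical handler: "order-1" fails once (e.g., a DB timeout),
        # while "poison" always fails.
        attempts[body] = attempts.get(body, 0) + 1
        if body == "poison" or attempts[body] < 2:
            raise RuntimeError("DB timeout")

    q = SimulatedQueue(max_receive_count=3)
    q.send("order-1")
    q.send("poison")
    q.poll(flaky_lambda)
    print(attempts)                 # receive attempts per message
    print([m.body for m in q.dlq])  # messages that exhausted their retries
```

The user never resends "order-1". It succeeds on redelivery, and only the message that keeps failing is set aside in the dead-letter queue.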

Page 6

Event Driven Architecture (Advanced)
[Diagram: API Gateway -> SQS 1 -> Lambda 1 -> EventBridge (event store + router). Based on values in the message, EventBridge rules (Rule 1, Rule 2, Rule 3) fire different targets, such as a Step Function, Lambda 2, and SQS queues (SQS 2, SQS 3). A second pattern uses SNS with subscription filters (Destination Filter 1, 2, 3): based on values in the message, SNS fires different targets, such as Lambda 1, Lambda 2, and an EKS application.]
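The sentence "based on values in the message, EventBridge can fire different targets" describes content-based routing. The Python sketch below implements only a simplified subset of EventBridge event-pattern matching: exact-value lists and nested objects. Prefix, numeric, and other content filters are not modelled. The rules, sources, and target names are hypothetical.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge-style matching: every key in the pattern must be
    present in the event; leaf values are lists of allowed exact values, and
    nested dicts are matched recursively."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical rules: route order events to different targets by "detail" values.
RULES = {
    "Rule 1": ({"source": ["shop.orders"], "detail": {"type": ["digital"]}}, "Lambda 2"),
    "Rule 2": ({"source": ["shop.orders"], "detail": {"type": ["physical"]}}, "Step Function"),
    "Rule 3": ({"source": ["shop.orders"], "detail": {"priority": ["high"]}}, "SQS 2"),
}


def targets_for(event: dict) -> list:
    """Return every target whose rule matches; one event can fan out to several."""
    return [target for pattern, target in RULES.values() if matches(pattern, event)]


if __name__ == "__main__":
    order = {"source": "shop.orders", "detail": {"type": "physical", "priority": "high"}}
    print(targets_for(order))
```

Because every rule is evaluated independently, a single event can trigger several targets. This is the fan-out behaviour the diagram shows.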

Page 7

3 Tier Architecture
[Diagram: presentation layer (external-facing ALB -> EC2 web servers in an Auto Scaling Group), application layer (internal ALB -> EC2 app servers in an Auto Scaling Group), and database layer (Amazon Aurora), each spread across Availability Zone 1 and Availability Zone 2.]
1. The first layer is the presentation layer. Customers consume the application through this layer; generally, this is where the front end runs, for example the amazon.com website. It is implemented with an external-facing load balancer distributing traffic to VMs (EC2 instances) running web servers.
2. The second layer is the application layer, where the business logic resides. Continuing the previous example: you browse products on amazon.com, find one you like, and click "add to cart". The request flows to the application layer, which validates availability and then creates a cart. This layer is implemented with an internal-facing load balancer and VMs running the application.
3. The last layer is the database layer, where information is stored: product information, your shopping cart, order history, etc. The application layer interacts with this layer for CRUD (Create, Read, Update, Delete) operations. It can be implemented using one database or a mix of them: SQL (e.g., Amazon Aurora) and/or NoSQL (e.g., DynamoDB).
Lastly, why is this so popular in interviews? This architecture comprises many critical patterns: microservices, load balancing, scaling, performance optimization, high availability, and more. Based on your answers, the interviewer can dig deeper and check your understanding of the core concepts.

Page 8

How Many Data Centers in One Availability Zone?
Correct Answer: An AWS Availability Zone (AZ) can contain multiple data centers. Each zone is usually backed by one or more physical data centers, with the largest backed by as many as five.
Incorrect Answer: One Availability Zone means one data center.

Page 9

Is AWS Lambda Cheaper than Amazon EC2?
Incorrect Answer: Yes, AWS Lambda is cheaper than Amazon EC2.
Correct Answer: It depends on the application. Lambda and EC2 have different cost factors (listed below), so, depending on the application, AWS Lambda can cost more than EC2, and vice versa. It is also important to consider not just the compute cost but the TCO (Total Cost of Ownership). With AWS Lambda there is no AMI to maintain, patch, and rehydrate, which reduces management overhead and hence overall TCO.
Lambda Cost Factors:
- Architecture (x86 vs. Graviton)
- Number of requests
- Duration of each request
- Amount of memory allocated (NOT used)
- Amount of ephemeral storage allocated
EC2 Main Cost Factors:
- Instance family
- Attached EBS
- Duration of EC2 runtime
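A back-of-the-envelope Python sketch makes "it depends" concrete. The formulas follow the cost factors listed above. Lambda charges per request plus GB-seconds of allocated memory, while EC2 charges for runtime whether or not it serves traffic. All prices in the example are made-up placeholders, not current AWS rates; check the AWS pricing pages for real numbers. Free tier, ephemeral storage, and architecture differences are ignored.

```python
def lambda_monthly_cost(requests, avg_duration_ms, memory_mb,
                        price_per_million_requests, price_per_gb_second):
    """Compute-only Lambda estimate: request charge + duration charge.

    Duration is billed on *allocated* memory, not the memory actually used.
    """
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_charge = (requests / 1_000_000) * price_per_million_requests
    return request_charge + gb_seconds * price_per_gb_second


def ec2_monthly_cost(instances, hourly_rate, hours=730, ebs_monthly=0.0):
    """EC2 estimate: instances bill for every running hour, busy or idle."""
    return instances * (hourly_rate * hours + ebs_monthly)


if __name__ == "__main__":
    # Placeholder prices for illustration only.
    for monthly_requests in (1_000_000, 100_000_000):
        lam = lambda_monthly_cost(monthly_requests, 100, 1024, 0.20, 0.00002)
        ec2 = ec2_monthly_cost(instances=1, hourly_rate=0.10)
        print(f"{monthly_requests:>11,} requests: Lambda ~${lam:.2f} vs EC2 ~${ec2:.2f}")
```

With these placeholder numbers, Lambda is far cheaper at low volume and more expensive at high, steady volume. The TCO point above (no AMIs to patch) still has to be weighed on top of the raw compute cost.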

Page 10

RPO Vs RTO
[Timeline: Most Recent Backup -> Disaster -> Recovery. The gap between the most recent backup and the disaster is the data loss (Recovery Point, RPO); the gap between the disaster and recovery is the downtime (Recovery Time, RTO).]
Candidates are sometimes confused by RPO, thinking it is measured in units of data, e.g., gigabytes or petabytes.
Correct Answer: Both RPO and RTO are measured in time. RTO stands for Recovery Time Objective and is a measure of how quickly after an outage an application must be available again. RPO, or Recovery Point Objective, refers to how much data loss your application can tolerate. Another way to think about RPO is how old the data can be when the application is recovered, i.e., the time between the last backup and the disaster. For both RTO and RPO, the targets are measured in hours, minutes, or seconds, with lower numbers representing less downtime or less data loss. RPO and RTO can, and often do, have different values for an application.
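Because both objectives are durations, checking an incident against them is plain time arithmetic. In this Python sketch, the function names and the timestamps in the example are hypothetical. The data-loss window runs from the last backup to the disaster, and the downtime window runs from the disaster to recovery.

```python
from datetime import datetime, timedelta


def achieved_rpo_rto(last_backup: datetime, disaster: datetime, recovered: datetime):
    """Return (data-loss window, downtime window) as timedeltas."""
    data_loss = disaster - last_backup  # compared against the RPO
    downtime = recovered - disaster     # compared against the RTO
    return data_loss, downtime


def meets_objectives(last_backup: datetime, disaster: datetime, recovered: datetime,
                     rpo_target: timedelta, rto_target: timedelta) -> bool:
    data_loss, downtime = achieved_rpo_rto(last_backup, disaster, recovered)
    return data_loss <= rpo_target and downtime <= rto_target


if __name__ == "__main__":
    backup = datetime(2024, 1, 1, 2, 0)      # hypothetical last backup at 02:00
    outage = datetime(2024, 1, 1, 2, 45)     # disaster at 02:45
    restored = datetime(2024, 1, 1, 4, 15)   # service back at 04:15
    print(achieved_rpo_rto(backup, outage, restored))  # 45 min of data, 1 h 30 min down
    print(meets_objectives(backup, outage, restored,
                           rpo_target=timedelta(hours=1), rto_target=timedelta(hours=1)))
```

Here the RPO target is met (45 minutes of lost data is within 1 hour), but the RTO target is missed (1 hour 30 minutes of downtime). This shows why the two are tracked separately.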

Page 11

IP Address Vs. URL
Bad Answer: A URL is a link assigned to an IP address.
[Diagram: DNS (Domain Name System) assigns the URL (Uniform Resource Locator) to a load balancer; the user accesses the URL, and the load balancer distributes traffic to VMs (e.g., EC2) with IP addresses 192.50.20.12 and 212.60.20.12. A third VM, 250.80.10.12, has gone down.]
Correct Answer: An IP address is a unique number that identifies a device connected to the internet, such as a virtual machine running your application. However, accessing a resource by this number is cumbersome. Moreover, when a VM goes down (250.80.10.12 in the diagram), a new VM comes up to replace it with a different IP address. Hence, in practice, an application running on VMs is accessed using a URL (Uniform Resource Locator).
One URL generally does NOT map to one IP address. Rather, the URL's domain name (e.g., www.amazon.com) is mapped through DNS to a load balancer, and that load balancer distributes traffic to multiple VMs with different IP addresses. Even if one VM goes down and another comes up, the URL keeps working because the load balancer routes traffic only to healthy instances. This way, you (the user) do not need to worry about the underlying IP addresses.
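The toy Python load balancer below shows why clients never need to know backend IPs. It hands out healthy targets in round-robin order, and a replacement VM with a new IP simply gets registered. The class, its methods, and the `10.0.0.5` replacement address are hypothetical. Health checks are reduced to an explicit `mark_unhealthy` call, and real load balancers use their own routing algorithms and health checks.

```python
from itertools import count


class RoundRobinLoadBalancer:
    """Toy load balancer: clients only know its DNS name; it picks a healthy
    backend IP for each request."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._counter = count()  # ever-increasing request counter

    def mark_unhealthy(self, ip: str) -> None:
        """Simulates a failed health check: stop sending traffic to this IP."""
        self.healthy.discard(ip)

    def register(self, ip: str) -> None:
        """A replacement VM with a new IP joins behind the same URL."""
        self.targets.append(ip)
        self.healthy.add(ip)

    def pick(self) -> str:
        candidates = [ip for ip in self.targets if ip in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy targets")
        return candidates[next(self._counter) % len(candidates)]


if __name__ == "__main__":
    lb = RoundRobinLoadBalancer(["192.50.20.12", "212.60.20.12", "250.80.10.12"])
    lb.mark_unhealthy("250.80.10.12")  # this VM went down
    print([lb.pick() for _ in range(4)])
    lb.register("10.0.0.5")            # its replacement comes up with a new IP
    print([lb.pick() for _ in range(3)])
```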

Page 12

DevOps CICD Phases with Tools
- Author: Write code (VS Code)
- Source: Check in source code (AWS CodeCommit, GitHub)
- Build: Compile code, create artifacts (AWS CodeBuild, Jenkins)
- Test: Unit testing, integration testing, load testing, UI testing, penetration testing (AWS CodeBuild, Jenkins)
- Deploy: Deploy artifacts (Jenkins, AWS CodeDeploy)
- Monitor: Logs, metrics, and traces (Amazon CloudWatch, AWS X-Ray)
[Diagram bands: Continuous Integration (CI) and Continuous Deployment (CD) span these phases.]

Page 13

DevOps CICD Phases
The phases (Author, Source, Build, Test, Deploy, Monitor) and tools are the same as on the previous page.
- Continuous Integration (CI) covers checking in, building, and testing code.
- Continuous Delivery (CD) adds a Manual Approval step before artifacts are deployed.
- Continuous Deployment (CD) deploys every change that passes the pipeline automatically, with no manual approval.

Page 14

Kubernetes Tools Ecosystem with AWS Cloud Implementation (Amazon EKS)
- Observability: Prometheus, Grafana, Fluent Bit, Jaeger, ADOT, CloudWatch, X-Ray
- Scaling: Karpenter, AutoScaling
- Delivery/Automation: Argo, Terraform, Jenkins, GitHub Actions, GitLab CI/CD
- Security: Gatekeeper, Trivy, ECR Scan, GuardDuty, kube-bench, Secrets Manager, Istio
- Cost Optimization: CloudWatch Container Insights, Cost and Usage Report (new feature: Split Cost Allocation), Kubecost

Page 15

Traditional CICD Vs. GitOps
[Diagram: Traditional CICD (steps 1-3): code, Dockerfile, and manifests in a Git repo -> CI tool builds the container image and pushes it to Amazon ECR -> CD tool updates the manifests with the container image tag -> CD tool pushes the files to Amazon EKS. GitOps (steps A-C): the same CI and CD steps, but a GitOps tool installed in the cluster checks for differences between the cluster and Git and pulls in changed files.]
Traditional DevOps
Step 1: Developers check in code, Dockerfile, and manifest YAMLs to an application repository. CI tools (e.g., Jenkins) kick off, build the container image, and save the image in a container registry such as Amazon ECR.
Step 2: CD tools (e.g., Jenkins) update the deployment manifest files with the tag of the container image.
Step 3: CD tools (e.g., Jenkins) execute the command to deploy the manifest files into the cluster, which, in turn, deploys the newly built container in the Amazon EKS cluster.
Conclusion - Traditional CICD is a push-based model. If a sneaky SRE changes a resource directly in the cluster (e.g., changes the number of replicas, or even the container image itself!), the resources running in the cluster deviate from what is defined in the YAML in Git. In the worst case, this change can break something, and the DevOps team needs to rerun part of the CICD process to push the intended YAMLs to the cluster.
GitOps
Step A: Developers check in code, Dockerfile, and manifest YAMLs to an application repository. CI tools (e.g., Jenkins) kick off, build the container image, and save the image in a container registry such as Amazon ECR.
Step B: CD tools (e.g., Jenkins) update the deployment manifest files with the tag of the container image.
Step C: With GitOps, Git becomes the single source of truth. You install a GitOps tool such as Argo CD inside the cluster and point it to a Git repo. The GitOps tool keeps checking whether there are new files in Git, or whether the resources in the cluster have drifted from the files in Git. As soon as the YAML is updated with the new container image, there is drift between what is running in the cluster and what is in Git. Argo CD pulls in the updated YAML file and deploys the new container.
Conclusion - GitOps does NOT replace DevOps. As you can see, GitOps only replaces part of the CD process. Going back to the scenario where the sneaky SRE changes a resource directly in the cluster, Argo CD detects the mismatch between the live state and the file in Git. With automated sync and self-heal enabled, it re-applies the file from Git and brings the Kubernetes resources back to their intended state. And don't worry, Argo can also send a message to the sneaky SRE's manager ;).
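The pull-based model comes down to a compare-and-correct loop. In this Python sketch, the hypothetical `diff` and `reconcile` functions treat Git as the desired state and the cluster as the live state. Resources are reduced to plain dicts, and pruning of resources that exist only in the cluster is not modelled. It illustrates the idea only; it is not how Argo CD is implemented. In Argo CD, automatic reverting of manual cluster edits corresponds to enabling automated sync with self-heal.

```python
def diff(desired: dict, live: dict) -> dict:
    """Return resources whose live spec is missing or differs from Git."""
    return {name: spec for name, spec in desired.items() if live.get(name) != spec}


def reconcile(desired: dict, live: dict, self_heal: bool = True) -> list:
    """One pass of a GitOps loop: compare Git (desired) with the cluster (live)
    and, if self-heal is on, overwrite drifted resources with the Git version."""
    drifted = diff(desired, live)
    if self_heal:
        for name, spec in drifted.items():
            live[name] = dict(spec)  # "apply" a copy of the Git version to the cluster
    return sorted(drifted)


if __name__ == "__main__":
    git = {"web": {"image": "web:v2", "replicas": 3}}      # manifest updated in step B
    cluster = {"web": {"image": "web:v1", "replicas": 3}}  # still running the old image
    print(reconcile(git, cluster), cluster)  # new image rolled out
    cluster["web"]["replicas"] = 10          # the sneaky SRE edits the cluster directly
    print(reconcile(git, cluster), cluster)  # drift detected and reverted to Git
```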

Page 16

Platform Team and Developer Team
[Diagram: the developer requests infrastructure through a ticketing system (1); the platform team receives the request (2) and provisions Amazon EKS with Infra as Code such as Terraform or CDK (3); a CI tool builds the container image from the code and Dockerfile in the Git repo and pushes it to Amazon ECR (4); a CD tool updates the manifests with the container image tag (5); a CD tool deploys the container to Amazon EKS (6).]
Recently, the term "platform team" has been floating around plenty. But what does a platform team do? How is it different from the developer team? Let's walk through the flow:
Step 1: The developer team requests the platform team to provision appropriate AWS resources. In this example, we are using Amazon EKS for the application, but this concept can be extended to any other AWS service. This request for AWS resources is typically made via a ticketing system.
Step 2: The platform team receives the request.
Step 3: The platform team uses Infrastructure as Code (IaC), such as Terraform or CDK, to provision the requested AWS resources, and shares the credentials with the developer team.
Step 4: The developer team kicks off the CICD process. We use a container workflow to illustrate the flow. Developers check in code, Dockerfile, and manifest YAMLs to an application repository. CI tools (e.g., Jenkins, GitHub Actions) kick off, build the container image, and save the image in a container registry such as Amazon ECR.
Step 5: CD tools (e.g., Jenkins, Spinnaker) update the deployment manifest files with the tag of the container image.
Step 6: CD tools execute the command to deploy the manifest files into the cluster, which, in turn, deploys the newly built container in the Amazon EKS cluster.
Conclusion - The platform team takes care of the infrastructure (often with guardrails) appropriate for the organization, and the developer team uses that infrastructure to deploy their application. The platform team handles the upgrades and maintenance of the infrastructure to reduce the burden on the developer team.

Page 17

Scaling Difference Between Lambda, EC2, EKS
[Diagram: EC2 instances scaling inside an EC2 Auto Scaling Group; Lambda functions scaling without an Auto Scaling Group; EKS worker nodes running in an EC2 Auto Scaling Group.]
EC2 Scaling: You need an Auto Scaling Group (ASG) and must define which EC2 metric it scales on, e.g., CPU utilization. You can use the ASG's minimum number of instances to keep a certain number of instances running at all times. ASGs also support scheduled scaling and warm pools.
Lambda Scaling: No ASG is needed. Lambda scales automatically: when the existing execution environments are busy, it creates a new one for each additional concurrent request. Consider increasing the Lambda concurrency limit (quota) as needed. Use Provisioned Concurrency to keep a certain number of Lambda execution environments pre-warmed; this can be adjusted either on a schedule or based on Provisioned Concurrency utilization.
EKS Scaling: EKS scaling is the most complex. You may or may NOT use an Auto Scaling Group, but even if you do, it does NOT work like regular EC2 scaling. Please refer to the next page to learn about EKS (Kubernetes/K8s) scaling in detail.
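When sizing Lambda concurrency (or Provisioned Concurrency), a common rule of thumb is: concurrency ≈ average requests per second × average request duration in seconds. Each in-flight request occupies one execution environment. The Python sketch below applies that formula. The function name and the workload numbers are hypothetical.

```python
import math


def estimate_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Concurrency ~= requests per second x average duration (seconds), rounded up.

    Each in-flight request occupies one execution environment, so this is
    roughly how many environments must exist at the same time.
    """
    return math.ceil(requests_per_second * avg_duration_seconds)


if __name__ == "__main__":
    # Hypothetical workload: 100 req/s at 500 ms each.
    print(estimate_concurrency(100, 0.5))
    # Same traffic, but each request now takes 2 s.
    print(estimate_concurrency(100, 2.0))
```

The same traffic needs four times the concurrency when requests run four times longer. That is why both request rate and duration matter when raising concurrency limits or pre-warming with Provisioned Concurrency.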

Page 18

How Do Kubernetes Worker Nodes Scale?
[Diagram: pods running on worker VMs (1-2), a new pod stuck in the Pending (Unschedulable) state because no VM has capacity (3), and a node autoscaler adding a new worker VM (4).]
Correct Answer:
Step 1: You configure the HPA (Horizontal Pod Autoscaler) to increase the replicas of your pods at a certain CPU/memory/custom-metric threshold.
Step 2: As traffic increases and the pod metric crosses the threshold, the HPA increases the number of pods. If there is capacity in the existing worker VMs, the Kubernetes kube-scheduler binds the new pods onto the running VMs.
Step 3: Traffic keeps increasing, and the HPA increases the number of pod replicas further. But now there is no capacity left in the running VMs, so the kube-scheduler can't schedule the pod (yet!). That pod goes into the Pending (unschedulable) state.
Step 4: As soon as pod(s) go into the Pending (unschedulable) state, Kubernetes node autoscalers (such as Cluster Autoscaler or Karpenter) provision a new node. Cluster Autoscaler requires an Auto Scaling Group, where it increases the desired VM count, whereas Karpenter doesn't require an Auto Scaling Group or node group. Once the new VM comes up, the kube-scheduler places the pending pod onto the new node.
Incorrect Answer: Set Auto Scaling Groups to scale at a certain VM metric utilization, like scaling regular VMs.
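Steps 1-4 can be sketched numerically in Python. `hpa_desired_replicas` uses the core HPA formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with the default 10% tolerance. `count_pending` is a hypothetical, heavily simplified first-fit placement by CPU request only. The real kube-scheduler filters and scores nodes on many more criteria.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, tolerance: float = 0.1) -> int:
    """Core HPA formula: ceil(currentReplicas * currentMetric / targetMetric),
    with no change while the ratio stays within the tolerance (default 0.1)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def count_pending(replicas: int, pod_cpu_request: float, node_free_cpu: list) -> int:
    """First-fit placement of identical pods onto nodes by CPU request only.
    Pods that fit nowhere stay Pending, which is the signal node autoscalers
    such as Cluster Autoscaler or Karpenter react to."""
    free = list(node_free_cpu)
    pending = 0
    for _ in range(replicas):
        for i, cpu in enumerate(free):
            if cpu >= pod_cpu_request:
                free[i] = cpu - pod_cpu_request
                break
        else:
            pending += 1  # no node had room for this pod
    return pending


if __name__ == "__main__":
    # Hypothetical: 4 pods averaging 90% CPU against a 60% target -> scale to 6.
    desired = hpa_desired_replicas(4, 90, 60)
    print(desired)
    # Two nodes with 1.0 vCPU free each; pods request 0.5 vCPU -> 2 pods stay Pending.
    print(count_pending(desired, 0.5, [1.0, 1.0]))
```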

Page 19

Gen AI 4 Layers
[Diagram: four stacked layers, bottom to top: Silicon Chips (e.g., AMD, NVIDIA); LLM Models (e.g., OpenAI, Anthropic, DeepSeek); Infrastructure Providers to Host/Train LLMs (e.g., AWS, Azure, GCP); Applications (e.g., Adobe Firefly, LLM chatbots). Going up, the learning curve goes from HARD to EASY and the opportunity for new market players grows; the most jobs (MLOps, LLMs with Kubernetes/Serverless, cloud LLM services, etc.) are in the upper layers.]
Gen AI hype is at an all-time high, and so is the confusion. What should you study, how should you think about it, and where are the most jobs? These are the burning questions on our minds. Gen AI can be broken down into the following four layers:
1. The bottom layer is the hardware layer, i.e., the silicon chips that train and run the models. Examples: AMD, NVIDIA.
2. Next come the LLMs that are trained and run on those chips. Examples: OpenAI, Anthropic.
3. Then come the infrastructure providers, which offer an easier way to consume, host, train, and run inference on the models. Example: AWS. This layer includes managed services such as Amazon Bedrock, which hosts pre-trained models, and VMs (Amazon EC2) that you can provision to train your own LLM.
4. Finally, there is the application layer, which uses those LLMs. Examples: Adobe Firefly, LLM chatbots, LLM travel agents.
Now, the important part: as you go from the bottom to the top, the learning curve gets easier, and it becomes easier for new market players to enter. Building new chips requires billions of dollars of investment, so it is much harder for new players to enter that market. The most opportunities are in the top two layers. If you already know the cloud, integrating Gen AI with your existing knowledge will increase your value immensely. If you work in DevOps, learn MLOps; if you know K8s/Serverless, learn how to integrate Gen AI with them; if you work on applications, integrate managed LLM services to enhance functionality. You get the idea!

Page 20

Prompt Engineering Vs RAG Vs Fine Tuning
Prompt Engineering
[Diagram: prompt -> Amazon Bedrock (hosts LLM) -> subpar response; enhanced prompt -> better response.]
1. You send a prompt to the LLM (hosted in Amazon Bedrock in this case) and get a response you are not satisfied with.
2. You enhance the prompt until you arrive at one that gives the desired, better response.
RAG (Retrieval Augmented Generation)
[Diagram: company data -> embeddings -> vector database. A prompt that can be enhanced by company data goes to code/a Jupyter notebook/an app, which searches the vector DB with the prompt, retrieves relevant info, augments the original prompt, and sends it to Amazon Bedrock (hosts LLM), which returns the generated answer.]
1. RAG is used where the response can be improved with company-specific data that the LLM does NOT have. You store relevant company data in a vector database. This is done through a process called embedding, in which data is transformed into numeric vectors.
2. The user gives a prompt that can be improved by adding company-specific info.
3. A process (code/Jupyter notebook/application) converts the prompt into a vector and then searches the vector database. Relevant info from the vector database is RETRIEVED (first part of RAG) and returned.
4. The original prompt is AUGMENTED (second part of RAG) with this company-specific info and sent to the LLM.
5. The LLM GENERATES (last part of RAG) the response and sends it back to the user.
Fine Tuning
[Diagram: base LLM + task-specific training dataset -> fine-tuned LLM; a prompt for the organization's use case -> response.]
1. If you need an LLM that is very specific to your company's or organization's use case, and RAG can't solve it, you train the base LLM with a large task-specific training dataset. The output is a fine-tuned LLM.
2. The user asks a question related to the use case and gets an answer.
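The retrieve-then-augment steps of RAG can be shown end to end with standard-library Python. To keep the example self-contained, the hypothetical `embed` function uses a bag-of-words term-count vector in place of a learned embedding model. A real RAG pipeline would call an embedding model and a vector database instead. The "company documents" are invented examples, and the generation step (sending the augmented prompt to an LLM) is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy "vector database": invented company documents stored with their vectors.
DOCS = [
    "Employees get 25 days of paid vacation per year.",
    "The VPN must be used when accessing internal dashboards.",
    "Expense reports are due within 30 days of travel.",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]


def retrieve(prompt: str, k: int = 1) -> list:
    """RETRIEVE: rank stored documents by similarity to the prompt vector."""
    q = embed(prompt)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def augment(prompt: str, k: int = 1) -> str:
    """AUGMENT: prepend retrieved company context to the original prompt."""
    context = "\n".join(retrieve(prompt, k))
    return f"Use this company context to answer.\nContext:\n{context}\nQuestion: {prompt}"


if __name__ == "__main__":
    # GENERATE would send this augmented prompt to an LLM (e.g., hosted in Bedrock).
    print(augment("How many vacation days do employees get?"))
```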

Page 21

RAG (Retrieval Augmented Generation) with Amazon Bedrock
[Diagram: company data in S3 -> Bedrock LLM that converts data to embeddings -> Bedrock Knowledge Base with an OpenSearch Serverless vector store (or a BYO vector store). A prompt flows to code/a Jupyter notebook/an app, which searches the vector DB, augments the original prompt with the retrieved info, and sends it to Amazon Bedrock LLMs, which return the generated answer.]
RAG is used where the response can be improved with company-specific data that the LLM does NOT have. Amazon Bedrock makes RAG very easy. The steps are:
1. You store relevant company data in an S3 bucket. Then, from Bedrock Knowledge Bases, you select an embedding model (Amazon Titan Embeddings or Cohere Embed) that converts the S3 data into embeddings (vectors). The Knowledge Base can also create a serverless vector store (OpenSearch Serverless) for you to save those embeddings. Alternatively, you can bring your own vector database (OpenSearch, Aurora, Pinecone, Redis).
2. The user gives a prompt that can be improved by adding company-specific info.
3. A process (code/Jupyter notebook/application) converts the prompt into a vector and then searches the vector database. Relevant info from the vector database is RETRIEVED (first part of RAG) and returned.
4. The original prompt is AUGMENTED (second part of RAG) with this company-specific info and sent to another Bedrock LLM.
5. The Bedrock LLM GENERATES (last part of RAG) the response and sends it back to the user.
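Once a Knowledge Base exists, steps 3-5 can be delegated to a single Bedrock call. The Python sketch below assumes the `bedrock-agent-runtime` client's `retrieve_and_generate` operation with a `KNOWLEDGE_BASE` configuration. The knowledge base ID and model ARN are placeholders you must supply. The client is passed in so the request shape can be checked without AWS credentials. Verify parameter names against the current boto3 documentation before relying on it.

```python
def build_rag_request(question: str, knowledge_base_id: str, model_arn: str) -> dict:
    """Keyword arguments for RetrieveAndGenerate: Bedrock retrieves from the
    Knowledge Base, augments the prompt, and generates with the given model."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,  # placeholder: your KB ID
                "modelArn": model_arn,                 # placeholder: generation model ARN
            },
        },
    }


def ask_knowledge_base(client, question: str, knowledge_base_id: str, model_arn: str) -> str:
    """client is expected to be boto3.client("bedrock-agent-runtime")."""
    response = client.retrieve_and_generate(
        **build_rag_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"]  # the generated answer
```

In a real environment, you would pass `boto3.client("bedrock-agent-runtime")` as `client`, together with your own Knowledge Base ID and model ARN. Bedrock performs the retrieve, augment, and generate steps server-side.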

Page 22

EKS Upgrade Simplified Using Karpenter
[Diagram: Before (EKS version 1.27): Karpenter, running in the data plane, gets the latest AMI for v1.27 from Systems Manager Parameter Store (EKS-optimized AMI list for all EKS versions) and provisions EC2 worker nodes with it. After a new EKS version is released: (1) the control plane is updated to v1.28, (2) Karpenter gets the latest AMI for v1.28, and (3) Karpenter recycles the worker nodes with the v1.28 AMI in rolling-deployment fashion.]
EKS upgrades can be tedious, but Karpenter can automatically upgrade your data plane worker nodes, reducing your burden. Here's how:
a. EKS-optimized AMI IDs are listed in AWS Systems Manager Parameter Store. Karpenter, the CNCF project and next-gen cluster autoscaler, periodically checks this and reconciles with the running worker nodes to see whether they are running the latest EKS-optimized AMI for the particular EKS version. In this case, let's assume EKS is running v1.27.
b. At some point, EKS releases the next version, 1.28, and the following workflow takes place:
1. The admin upgrades the EKS control plane to v1.28.
2. Following the same logic, Karpenter retrieves the latest AMI for v1.28 and checks whether the worker nodes are running it. They are NOT, so Karpenter Drift is triggered.
3. To resolve this drift, Karpenter automatically updates the worker nodes to v1.28 AMIs. It does this as a rolling deployment (i.e., a new node comes up, and the existing node is cordoned, drained, and then terminated). It also respects Kubernetes eviction API constraints, such as PodDisruptionBudgets (PDBs).
If you want to know the process in detail, including with custom AMIs, check out the Karpenter drift blog - https://aws.amazon.com/blogs/containers/how-to-upgrade-amazon-eks-worker-nodes-with-karpenter-drift/
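The drift check and rolling replacement described above can be sketched in a few lines of Python. `LATEST_AMI` is a hypothetical stand-in for the Systems Manager parameters. For Amazon Linux 2, the real parameter names have the form /aws/service/eks/optimized-ami/<version>/amazon-linux-2/recommended/image_id. The AMI IDs and node names are invented. This only illustrates the logic; it is not Karpenter's implementation, which also honours PDBs and disruption budgets.

```python
# Hypothetical snapshot of SSM parameters: EKS version -> latest EKS-optimized AMI ID.
LATEST_AMI = {"1.27": "ami-aaa127", "1.28": "ami-bbb128"}


def drifted_nodes(control_plane_version: str, nodes: dict) -> list:
    """Nodes whose AMI differs from the latest AMI for the control-plane version."""
    latest = LATEST_AMI[control_plane_version]
    return sorted(name for name, ami in nodes.items() if ami != latest)


def rolling_replace(control_plane_version: str, nodes: dict) -> list:
    """Replace drifted nodes one at a time: launch new, cordon + drain old, terminate old."""
    latest = LATEST_AMI[control_plane_version]
    log = []
    for old in drifted_nodes(control_plane_version, nodes):
        new = f"{old}-replacement"
        nodes[new] = latest                  # new node comes up first
        log.append(f"launch {new}")
        log.append(f"cordon+drain {old}")    # pods are evicted onto other nodes
        del nodes[old]
        log.append(f"terminate {old}")
    return log


if __name__ == "__main__":
    nodes = {"node-a": "ami-aaa127", "node-b": "ami-aaa127"}
    print(drifted_nodes("1.27", nodes))      # no drift while on v1.27
    for line in rolling_replace("1.28", nodes):  # control plane upgraded to v1.28
        print(line)
```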

Page 23

EKS Upgrade Using Karpenter - Automatic and Manual
[Diagram: Automatic: the control plane is updated from v1.27 to v1.28 (1), Karpenter gets the latest AMI for v1.28 from Systems Manager Parameter Store (2), and it recycles the worker nodes with the v1.28 AMI in rolling-deployment fashion (3). Manual (more control): with EKS version 1.28, worker nodes run the EKS-optimized Bottlerocket AMI v1.20.5, are pinned to that version, and are NOT auto-upgraded.]
Karpenter can automatically upgrade your K8s worker nodes. What if you don't want that? That's possible too. Here's how:
You can pin your worker nodes to a specific AMI version using the Karpenter EC2NodeClass. For example, with the manual (pinned) EC2NodeClass below, Karpenter provisions the worker nodes with the Amazon EKS-optimized Bottlerocket AMI version v1.20.5. Even after AWS releases a new AMI or you upgrade your EKS control plane, Karpenter will NOT upgrade the data plane. Remember that the kubelet on worker nodes can run up to three minor Kubernetes versions behind the control plane (for Kubernetes 1.28 and later). However, running the latest version is recommended because of security patches.
The general practice I see with my customers for critical projects is to let Karpenter automatically upgrade to the latest AMI version in dev/test and to pin a specific AMI in prod. Once the newest version is tested in dev/test, the production EC2NodeClass is changed to point to that version, and Karpenter then upgrades the worker node AMIs in a rolling-deployment fashion.
Automatic: the "alias: bottlerocket@latest" in the EC2NodeClass ensures Karpenter always upgrades your worker nodes automatically if AWS releases a new AMI OR you upgrade your EKS control plane.

    apiVersion: karpenter.k8s.aws/v1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiSelectorTerms:
        - alias: bottlerocket@latest
      # Other required fields (role, subnetSelectorTerms,
      # securityGroupSelectorTerms) omitted for brevity.

Manual (more control): worker nodes are pinned to a specific version and are NOT auto-upgraded.

    apiVersion: karpenter.k8s.aws/v1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiSelectorTerms:
        - alias: bottlerocket@v1.20.5
      # Other required fields (role, subnetSelectorTerms,
      # securityGroupSelectorTerms) omitted for brevity.

Page 24

Container Lifecycle - Local to Cloud
[Diagram: on the local machine, the developer writes code, a Dockerfile, and a manifest with the image URL; docker build and docker run test the container locally; docker push uploads the container image to Amazon ECR; kubectl apply deploys the manifest to Amazon EKS.]
The fundamental container workflow from local machine to cloud is:
1. The developer writes code and an associated Dockerfile to containerize the code on her local machine.
2. She uses the "docker build" command to create the container image on her local machine. At this point, the container image is saved on the local machine.
3. The developer uses the "docker run" command to run the container image and test the code running in the container. The developer can repeat steps 1-3 until testing meets the requirements.
4. Next, the developer tags the image with the registry's repository URI (e.g., using "docker tag") and runs the "docker push" command to push the container image from the local machine to a container registry, such as Docker Hub or Amazon ECR.
5. Finally, using the "kubectl apply" command, a YAML manifest that contains the URL of the container image in Amazon ECR is deployed into the running Kubernetes cluster.
Note that this is to understand the lifecycle of the container. In the real world, after testing is done on the local machine, the remaining steps are automated. Refer to the "Traditional CICD vs GitOps" page for that workflow.

Page 25

Gen AI Multi Model Invocation
[Diagram: API Gateway -> Lambda 1 -> EventBridge (event store + router). Based on values in the message, EventBridge rules (Rule 1, Rule 2, Rule 3) fire different targets, such as a Step Function and ECS Fargate, which invoke different LLMs (LLM A, LLM B, LLM C) hosted on SageMaker JumpStart or Amazon Bedrock.]

Page 26

Git Workflow

Local machine: Local IDE (e.g. Visual Studio Code) -> git add -> Index/Staging Area -> git commit -> Local Repository (works like a database)
Local Repository -> git push -> Remote Repository (e.g. GitHub, GitLab)

Git and GitHub for Beginners Crash Course (Click on the YouTube icon):
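The four areas in the diagram can be modeled as four Python containers. This toy model is not how Git stores data, since real Git uses content-addressed objects. It does show why `git add` matters: a commit records the staged version of a file, not whatever is currently in the editor. It assumes a single branch with linear history.

```python
# Toy model of the working directory, staging area, local repo, and remote.
# A "commit" is just a snapshot dict; assumes one branch with linear history.

class Repo:
    def __init__(self) -> None:
        self.working_dir: dict[str, str] = {}     # files as edited in the IDE
        self.staging: dict[str, str] = {}         # index / staging area
        self.commits: list[dict[str, str]] = []   # local repository history
        self.remote: list[dict[str, str]] = []    # e.g. GitHub or GitLab

    def add(self, path: str) -> None:
        """git add: copy the file's current content into the staging area."""
        self.staging[path] = self.working_dir[path]

    def commit(self) -> None:
        """git commit: previous snapshot plus staged changes becomes a new commit."""
        snapshot = dict(self.commits[-1]) if self.commits else {}
        snapshot.update(self.staging)
        self.commits.append(snapshot)
        self.staging.clear()

    def push(self) -> None:
        """git push: send the local commits the remote does not have yet."""
        self.remote.extend(self.commits[len(self.remote):])


repo = Repo()
repo.working_dir["file1"] = "v1"
repo.add("file1")
repo.working_dir["file1"] = "v2"   # edited after staging, so NOT part of the commit
repo.commit()
repo.push()
print(repo.remote)  # [{'file1': 'v1'}]
```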

Page 27

DevSecOps Workflow

Pipeline phases and sample tools:
- Author: VS Code
- Source: AWS CodeCommit, GitHub
- Build: AWS CodeBuild, Jenkins
- Test: AWS CodeBuild, Jenkins
- Deploy: Jenkins, AWS CodeDeploy
- Monitor: Amazon CloudWatch, AWS X-Ray

Security embedded throughout the pipeline:
- Static code analysis (sample tools: SonarQube, graudit) and linting of Infra as Code
- Use AWS Secrets Manager
- Dynamic testing and analysis (sample tools: Astra, Invicti), penetration testing, DDoS testing, and fault injection simulation
- How is the app exposed? Use private subnets, AWS WAF, AWS Shield, and AuthN/Z
- Monitor hosts

Page 28

What Happens When You Type a URL

1. The user types www.amazon.com.
2. The local machine checks its caches for the IP address of the URL.
3. If the address is NOT cached, the DNS resolver queries the DNS root name server, then the name server for the .com TLD, and then Amazon Route 53. Route 53 returns the IP address of the load balancer (LB) for the URL's front end.
4. The DNS resolver returns the obtained IP address.
5. The request goes to the load balancer, which forwards it to the EC2 instances hosting Amazon.com.
6. The front-end pages are returned, and the page is rendered and displayed to the user.

Free video explaining the steps in detail: Click
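Steps 2-4 amount to a cache lookup followed by an iterative walk from the root to the TLD to the authoritative name server. The sketch below models that walk with in-memory dictionaries. It is not a real resolver, since it has no DNS protocol, no TTLs, and no CNAMEs. The server names are hypothetical, and 192.0.2.10 comes from a documentation-only IP range.

```python
# Toy iterative DNS resolution with a local cache. Server names and the IP
# (from the 192.0.2.0/24 documentation range) are hypothetical.

ROOT = {"com.": "tld-com"}                         # root: who serves .com?
SERVERS = {
    "tld-com": {"amazon.com.": "route53"},         # .com TLD: who is authoritative?
    "route53": {"www.amazon.com.": "192.0.2.10"},  # authoritative answer (LB IP)
}

cache: dict[str, str] = {}


def resolve(name: str) -> tuple[str, list[str]]:
    """Return (ip, servers queried). An empty list means a cache hit."""
    fqdn = name if name.endswith(".") else name + "."
    if fqdn in cache:
        return cache[fqdn], []
    labels = fqdn.rstrip(".").split(".")
    tld = labels[-1] + "."                      # "com."
    domain = ".".join(labels[-2:]) + "."        # "amazon.com."
    tld_server = ROOT[tld]                      # ask the root
    auth_server = SERVERS[tld_server][domain]   # ask the TLD server
    ip = SERVERS[auth_server][fqdn]             # ask the authoritative server
    cache[fqdn] = ip
    return ip, ["root", tld_server, auth_server]


print(resolve("www.amazon.com"))  # ('192.0.2.10', ['root', 'tld-com', 'route53'])
print(resolve("www.amazon.com"))  # ('192.0.2.10', [])
```

The second call never leaves the machine. This is why step 2 checks the local caches before any name server is contacted.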

Page 29

Well Architected Framework

The AWS Well-Architected Framework describes key concepts, design principles, and architectural best practices for designing and running workloads in the cloud. By answering a few foundational questions, you learn how well your architecture aligns with cloud best practices and get guidance for making improvements. The assessment questions can be answered inside your AWS account using the AWS Well-Architected Tool. The tool then produces a scorecard that evaluates your application against the six pillars below.

- Operational Excellence Pillar: focuses on running and monitoring systems, and continually improving processes and procedures. Key topics include automating changes, responding to events, and defining standards to manage daily operations.
- Security Pillar: focuses on protecting information and systems. Key topics include confidentiality and integrity of data, managing user permissions, and establishing controls to detect security events.
- Reliability Pillar: focuses on workloads performing their intended functions and recovering quickly from failure to meet demands. Key topics include distributed system design, recovery planning, and adapting to changing requirements.
- Performance Efficiency Pillar: focuses on structured and streamlined allocation of IT and computing resources. Key topics include selecting resource types and sizes optimized for workload requirements, monitoring performance, and maintaining efficiency as business needs evolve.
- Cost Optimization Pillar: focuses on avoiding unnecessary costs. Key topics include understanding spending over time and controlling fund allocation, selecting resources of the right type and quantity, and scaling to meet business needs without overspending.
- Sustainability Pillar: focuses on minimizing the environmental impacts of running cloud workloads. Key topics include a shared responsibility model for sustainability, understanding impact, and maximizing utilization to minimize required resources and reduce downstream impacts.

Page 30

AWS Migration Tools

- AWS Application Discovery Service: discovers on-premises applications
- AWS Database Migration Service (DMS): migrates Oracle databases to Amazon Aurora
- AWS Application Migration Service: migrates on-premises servers to Amazon EC2
- AWS DataSync: migrates a shared file system to Amazon S3 or Amazon EFS
- AWS Migration Hub: a single place to track migration progress
- Connectivity: AWS Direct Connect (or VPN, or the Internet)

Page 31

Multi-site Active Active

- Route 53 uses geolocation/latency routing to send users to Region A or Region B in the AWS Cloud.
- In each region, a VPC spans two Availability Zones, with an Elastic Load Balancer in front of an Auto Scaling Group of app servers.
- Each region has a DynamoDB table with DynamoDB continuous backup.
- DynamoDB global table replication keeps the tables in both regions in sync, so both regions actively serve traffic.
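Both regions serve live traffic and DynamoDB global tables replicate the data. Failover is therefore mostly a routing decision. The sketch below is a simplified model of latency-based routing with health checks: send each user to the lowest-latency region that is currently healthy. The region names and latency figures are hypothetical.

```python
# Simplified model of Route 53 latency routing with health checks: pick the
# lowest-latency HEALTHY region. Region names and latencies are hypothetical.

def pick_region(latency_ms: dict[str, float], healthy: set[str]) -> str:
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


user_latency = {"region-a": 20.0, "region-b": 85.0}
print(pick_region(user_latency, {"region-a", "region-b"}))  # region-a
print(pick_region(user_latency, {"region-b"}))              # region-b
```

In the second call Region A has failed its health check. The user lands in Region B, which already holds a replicated copy of the data.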

Page 32

Kubernetes Node Pod Container Relationship

Flow: Developer code & Dockerfile (in local machine) -> container image stored in Amazon ECR -> manifest with image URL applied to Amazon EKS (Control Plane) -> containers run on EKS worker nodes

1. The developer creates the container image, which gets deployed on Amazon EKS.
2. The container image runs inside a Kubernetes Pod. The Pod is the smallest deployable unit in Kubernetes. A container can NOT run in Kubernetes without being inside a pod.
3. Pods run on EC2 worker nodes. Multiple pods can run on one EC2 instance.
4. An easy way to remember this is "NPC" (like in video games): Node - Pod - Container, in that order. A node hosts one or many pods, and a pod hosts one or many containers.

Page 33

Batch Workloads on AWS

1. Use Amazon EventBridge Scheduler to schedule jobs. You can also trigger jobs based on EventBridge rules, which gives you great flexibility and power.
2. The batch job is then submitted via AWS Batch. You may be wondering why you can't submit a container job directly from EventBridge without AWS Batch, since the diagram shows a container image and then ECS/EKS. The reason is that AWS Batch maintains job queues, retries, resource management, and more, none of which are available if you skip it.
3. The actual steps of the job need to be packaged in a container (with the image stored in Amazon ECR).
4. The job running in the container is submitted to either Amazon ECS or Amazon EKS. AWS Batch supports both EC2 and Fargate.
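Step 2's point is that AWS Batch does queue and retry bookkeeping for you. The toy job queue below shows what that bookkeeping involves. It is loosely modeled on the idea of a retry strategy with a maximum number of attempts, and it is NOT the AWS Batch API. The job names and the flaky job are hypothetical.

```python
# Toy FIFO job queue with retries: the kind of bookkeeping AWS Batch handles.
# NOT the AWS Batch API; job names and the flaky job are hypothetical.
from collections import deque
from typing import Callable


def run_queue(jobs: list[tuple[str, Callable[[], None]]], attempts: int) -> dict[str, str]:
    """Run jobs in order, retrying each failed job up to `attempts` tries in total."""
    queue = deque((name, fn, 1) for name, fn in jobs)
    status: dict[str, str] = {}
    while queue:
        name, fn, attempt = queue.popleft()
        try:
            fn()
            status[name] = f"SUCCEEDED (attempt {attempt})"
        except Exception:
            if attempt < attempts:
                queue.append((name, fn, attempt + 1))  # re-queue for another try
            else:
                status[name] = "FAILED"
    return status


calls = {"n": 0}


def flaky() -> None:
    """Fails on its first two runs, then succeeds (a transient error)."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")


print(run_queue([("nightly-report", flaky), ("always-fails", lambda: 1 / 0)], attempts=3))
# {'nightly-report': 'SUCCEEDED (attempt 3)', 'always-fails': 'FAILED'}
```

If EventBridge launched the container directly, logic like this would have to live in your own code or the job would be lost on the first transient failure.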

Page 34

STAR Interview

- SITUATION: background of the project
- TASK: goals that you need to achieve
- ACTION: steps that you took
- RESULT: what you achieved

One of the biggest mistakes people make in behavioral interviews is that they keep saying "we": "we did this task", "we came up with a plan", "we completed step X". This may sound odd. At Amazon we expect you to be a team player, but in the interview we want to know precisely what YOU did. Think of it as selecting a player for your Super Bowl team. Yes, we want the receiver to play well with the team, but while selecting, we only care about his stats and his abilities. So make sure to clarify what part YOU played in your answers.

The next biggest mistake people make is talking in hypotheticals. When a question starts with "Tell me about a time when you...", you must give an example from your past projects. If you answer only in hypotheticals such as "I will do this, that...", you will fail the interview.

Now let's look at a sample question and answer.

Q: Tell me about one difficult project that you delivered. What was the difficulty, and how did you determine the course of action? What was the result?

A: I migrated our project to AWS Serverless after considering K8s and EC2. We coded the Lambda, tested it, implemented DevOps, then deployed into prod, and it was a huge success.

Is the above answer good or bad? It's quite bad. Why?
• Situation and Task are not described
• "We": what actions did YOU perform?
• "Huge success": no data, very subjective

A good answer may look like this:

Situation - We had 20 microservices running on-prem on PCF. The PCF license needed to be renewed in 6 months, and leadership wanted the project to migrate to AWS before then to save cost and increase agility.

Task - As the lead architect/developer/tech lead, I was tasked with finding a suitable AWS solution within the given timeframe.

Action - I researched possible ways to run microservices on AWS. I narrowed it down to three options: run each microservice on vanilla EC2, run on K8s using EKS, or go Serverless. I took one of the microservices and did a POC on vanilla EC2, EKS, and Lambda with API Gateway. They all did the job, but I found two drawbacks with EC2. I would have to make it HA by spinning up multiple EC2 instances in multiple AZs, and there is the overhead of AMI rehydration. EKS seemed to be a valid solution. However, given our traffic patterns, we would pay more than necessary, and there was also the overhead of training the team on K8s. Lambda with API Gateway is inherently HA and scalable, we pay only for what we use, and there are no servers to manage at all. This simplifies our day 2 operations and lets us focus on delivering business value.

Result - Based on the POC data on performance, cost, and time to deploy, I selected the Serverless solution. We converted the rest of the microservices to Lambda and implemented them in production within 3 months. This resulted in over 90% cost savings compared to EC2 and K8s. I shared my project learnings with other teams and showed them how to code Lambda so they could use it as well. I was recognized by the CIO for this effort.

Why is this answer good?
• Situation, Task, Action, and Result are clearly defined
• It gives details on what I did
• The result has data, not just "huge success"

Note that you will get follow-up questions on your answer. They test your depth and make sure you are not just copying and pasting a generic answer from the Internet 😅.

Page 35

Hybrid Cloud Architecture

Hybrid means AWS and the data center working together for the application.
- User -> Route 53 -> ALB -> Amazon EC2 (application code moved to AWS)
- Site-to-Site VPN or Direct Connect links AWS to the data center
- On-prem databases (stay on-premises until all apps move to AWS, or perhaps due to regulatory requirements)

Page 36

Karpenter Consolidation Saving Money

1. Worker VMs are running.
2. Some nodes are underutilized.
3. Pods are consolidated (bin-packed) onto fewer worker VMs.
4. Unused nodes are terminated == significant savings.

Enable consolidation in the Karpenter NodePool YAML:
kind: NodePool
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized

To Know More: Get the Highest Rated Karpenter Masterclass Course on Udemy (Click the image)
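The savings in step 4 come from bin packing. The sketch below repacks pod CPU requests onto as few equal-size nodes as possible using first-fit decreasing, a classic bin-packing heuristic. This is a swapped-in illustration and not Karpenter's algorithm. Karpenter also weighs instance prices, disruption budgets, and more. The pod sizes and the 4-vCPU node size are hypothetical, and every pod is assumed to fit on a single node.

```python
# First-fit-decreasing bin packing of pod CPU requests onto equal-size nodes.
# Illustration only; pod sizes and node size are hypothetical.

def first_fit_decreasing(pod_cpu: list[float], node_cpu: float) -> list[list[float]]:
    nodes: list[list[float]] = []
    for cpu in sorted(pod_cpu, reverse=True):   # place the largest pods first
        for node in nodes:
            if sum(node) + cpu <= node_cpu:
                node.append(cpu)                # fits on an existing node
                break
        else:
            nodes.append([cpu])                 # no room anywhere: open a new node
    return nodes


# Before: four 4-vCPU nodes, each lightly used.
before = [[1.0, 0.5], [1.0], [0.5, 0.5], [1.5]]
pods = [cpu for node in before for cpu in node]
after = first_fit_decreasing(pods, node_cpu=4.0)
print(len(before), "->", len(after))  # 4 -> 2
```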

Page 37

API Gateway Auth

There are a total of 5 auth mechanisms. In each one, the client calls API Gateway, which invokes Lambda, which reads/writes DynamoDB.

1. API Key
   - API Gateway generates a static key.
   - The client sends the static key in a header.

2. IAM
   - The client signs the request with its IAM access key and secret access key.
   - API Gateway validates the IAM credentials.

3. Cognito User Pool
   - The client exchanges a userid/pwd with Cognito for a temporary JWT token.
   - The client sends the JWT token to API Gateway.
   - API Gateway validates the token.

4. Cognito User Pool with Federated Identities
   - The user authenticates, exchanging a third-party userid/pwd with the Cognito User Pool for a temporary JWT token.
   - The client exchanges the token with Cognito Federated Identity for IAM credentials.
   - The client sends the IAM credentials to API Gateway, which validates them with IAM.

5. Third Party Identity Provider (IdP) with Lambda Authorizer
   - The client exchanges an IdP userid/pwd with the Identity Provider for a temporary JWT token.
   - The client sends the token to API Gateway.
   - A Lambda Authorizer validates the token with the IdP and returns an IAM policy, which API Gateway evaluates to allow or deny the request.
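Mechanism 5 hinges on what the Lambda Authorizer returns. Below is a minimal sketch of a TOKEN-type authorizer. The event fields `authorizationToken` and `methodArn` and the `principalId`/`policyDocument` response shape follow API Gateway's Lambda authorizer contract. `validate_token` and `KNOWN_TOKENS` are HYPOTHETICAL stand-ins. A real authorizer would verify the JWT's signature, issuer, audience, and expiry against the IdP.

```python
# Minimal Lambda TOKEN authorizer sketch. validate_token and KNOWN_TOKENS are
# HYPOTHETICAL placeholders for real JWT verification against your IdP.
from typing import Optional

KNOWN_TOKENS = {"Bearer demo-token": "user-123"}  # hypothetical, illustration only


def validate_token(token: str) -> Optional[str]:
    """Return the caller's principal ID if the token is valid, else None."""
    return KNOWN_TOKENS.get(token)


def policy(principal_id: str, effect: str, resource: str) -> dict:
    """Build the IAM policy document that API Gateway evaluates."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }


def lambda_handler(event: dict, context: object) -> dict:
    principal = validate_token(event.get("authorizationToken", ""))
    if principal is None:
        # An explicit Deny makes API Gateway reject the call with 403 Forbidden.
        return policy("anonymous", "Deny", event["methodArn"])
    return policy(principal, "Allow", event["methodArn"])
```

API Gateway can cache the returned policy for a configurable TTL. Repeat calls with the same token then skip the authorizer invocation.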

Page 38

Serverless Web Application

- Route 53: assigns a readable domain
- CloudFront -> S3: serves the static website (CSS, JS, images)
- API Gateway -> Lambda -> DynamoDB: the API is invoked for additional info (e.g. a button is clicked on the site). API Gateway handles authentication and authorization, throttling, DDoS protection, and more.

Page 39

EDA (Event Driven Architecture) with Kubernetes

Flow: Application -> SQS (event store) -> KEDA scales the HPA based on the number of messages in the queue -> pending unschedulable pods of the message-processing app -> Karpenter provisions worker VMs in the Kubernetes cluster in response to pending pods

Karpenter can be used with other CNCF projects to deliver powerful solutions for common use cases. One prominent example is using Kubernetes Event Driven Autoscaling (KEDA) with Karpenter to implement event-driven workloads. With KEDA, you can drive the scaling of any container in Kubernetes based on the number of events waiting to be processed. One popular implementation is to scale up worker nodes to accommodate pods that process messages arriving in a queue:
1. The application inserts messages into the queue.
2. KEDA monitors the queue depth and scales the application's HPA to process the messages.
3. The HPA increases the number of pods. If no capacity is available to schedule those pods, Karpenter provisions new nodes.
4. The kube-scheduler places those pods on the new VMs, and the pods process the messages from the queue.
5. Once processing is done, the number of pods goes to zero, and Karpenter can scale the VMs down to zero for maximum cost efficiency.
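The five steps reduce to two ceiling divisions. The sketch below is a back-of-the-envelope model. Desired pods come from queue depth divided by a per-pod message target, which KEDA's SQS scaler calls `queueLength`. Nodes then come from pods divided by pods per node. The target of 5 messages per pod, the 50-pod cap, and 8 pods per node are hypothetical. Real scheduling depends on each pod's resource requests, not a fixed pods-per-node count.

```python
# Back-of-the-envelope KEDA -> HPA -> Karpenter chain. msgs_per_pod, max_pods,
# and pods_per_node are hypothetical tuning values.
import math


def desired_pods(queue_depth: int, msgs_per_pod: int, max_pods: int) -> int:
    if queue_depth == 0:
        return 0  # scale to zero when the queue is empty (minReplicaCount: 0)
    return min(max_pods, math.ceil(queue_depth / msgs_per_pod))


def nodes_needed(pods: int, pods_per_node: int) -> int:
    # Pods that cannot fit on existing capacity stay Pending, which triggers Karpenter.
    return math.ceil(pods / pods_per_node)


for depth in (0, 7, 120, 10_000):
    pods = desired_pods(depth, msgs_per_pod=5, max_pods=50)
    print(f"queue={depth:>6}  pods={pods:>3}  nodes={nodes_needed(pods, pods_per_node=8)}")
```

The `max_pods` cap limits how far a message burst can scale the fleet. That, in turn, bounds the number of nodes Karpenter will launch.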

Page 40

EDA (Event Driven Architecture) with SNS, SQS, Kubernetes

- SNS performs payload-based filtering & routing of messages to two queues, SQS1 and SQS2.
- SQS1 (event store) -> Kubernetes cluster: the HPA scales the message-processing app based on the number of messages in the queue, and Karpenter provisions worker VMs in response to pending unschedulable pods.
- SQS2 (event store) -> Lambda -> DynamoDB

Page 41

EKS Auto Mode

- Control Plane: managed by AWS, together with Karpenter, Ingress, and Storage. AWS makes these scalable, HA, and secure.
- Data Plane: managed EC2 instances. CoreDNS, kube-proxy, and the CNI run as processes baked into the AWS-managed AMI.

Amazon EKS Auto Mode:
- AWS manages core add-ons like Karpenter, Ingress, and EBS
   - AWS makes them secure, scalable, and HA, just like the EKS control plane components
   - AWS manages their versions
- AWS manages the AMIs
   - The AMI is based on Bottlerocket, with CoreDNS, kube-proxy, and VPC CNI components baked in
- Managed worker nodes are EC2 instances, which means you can run pretty much anything that you can run on regular Kubernetes
- This is a big one: EKS Auto supports DaemonSets, so you can run your favorite tools and agents
- Use Reserved Instances, Savings Plans, Spot, and Graviton with EKS Auto

Page 42

Amazon Aurora Global Database Vs. DSQL

Diagram: Route 53 with geolocation/latency routing sends traffic to Region A and Region B in the AWS Cloud. In each region, a VPC spans two Availability Zones, with an Elastic Load Balancer, an Auto Scaling Group of app servers, and Amazon Aurora DSQL (Multi-AZ). Both regions can accept writes, unlike Global Database.

Amazon Aurora released DSQL, which may seem similar to Aurora Global Database. What are the similarities and differences?
1. Amazon Aurora, whether using Global Database or DSQL, stores data across multiple AZs within a single region. In the unlikely event of an AZ failure, your data remains highly available.
2. The main difference is replication:
   - Global Database replicates data from one region to another, and only one region can write to the table. The secondary region acts as a reader and is promoted to the primary (writer) only when the primary region fails.
   - DSQL is Active-Active: both regions can accept writes and cross-replicate. DSQL also stores transaction logs that can be used to recover lost transactions during a disaster.
3. Aurora Global Database is supported for MySQL and PostgreSQL, and the DSQL public preview is available for PostgreSQL.
4. Until now, Aurora could autoscale Aurora Replicas for reads only. DSQL can horizontally scale both reads and writes (compute and storage), which will be a big leap and a game changer for SQL databases.

Page 43

Amazon S3 Tables with Apache Iceberg

Diagram: Applications read and write S3 Tables via the Apache Iceberg standard. The tables are automatically cataloged in AWS Glue (S3 Table Catalog) and can be analyzed using various services (Amazon EMR, Amazon Athena, Amazon Redshift, Amazon QuickSight, Amazon Data Firehose) or with simple SQL queries.

Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive, and Impala to safely work with the same tables at the same time.
- Prior to this announcement, you could save data as Iceberg tables in general purpose S3 buckets, which had overhead (using the Data Catalog in Lake Formation)
- With this announcement, you can define tables easily, with much better performance
   - 10X TPS and 3X faster query performance compared to Iceberg data stored in general purpose buckets
- Tables are automatically registered to the Glue Data Catalog
   - The catalog can be used by Amazon EMR (Spark), Athena, Redshift, QuickSight, and Data Firehose
- Fully managed Apache Iceberg tables in S3: each table gets an ARN (Amazon Resource Name)
   - Optimized performance, security controls, cost optimization
   - Automatic compaction
- Run simple SQL queries
- Applications read and write data via the Apache Iceberg standard

Page 44

Docker Vs Kubernetes

Flow: Developer code & Dockerfile (in local machine) -> docker build -> Docker image -> docker run (test container in local machine) -> docker push -> Amazon ECR -> kubectl apply (manifest with image URL) -> Amazon EKS (Kubernetes), running replicas on EC2 worker nodes
- Docker image: files, libraries, and dependencies packaged together
- Dockerfile: a file with commands to create the Docker container image
- Kubernetes: a container orchestrator that manages auto healing, scaling, and networking between multiple replicas of the running Docker container images

1. The term "Docker" is used in a lot of places in the container lifecycle, so let's understand it with the flow above.
2. The two primary areas the term "Docker" is associated with are the Dockerfile and, most often, the Docker image (also known as a Docker container image, or just a container image).
3. A Dockerfile is a file with commands that packages the code, libraries, and dependencies into the Docker container image.
4. This Docker image is a single copy of your application that is often tested on your local machine (note the single Docker container image on the laptop!). The image is stored in a registry such as Amazon ECR.
5. Finally, using the "kubectl apply" command, a YAML manifest containing the URL of the container image in Amazon ECR is deployed into the running Kubernetes cluster.
6. Running a single Docker container is easy. When you need to run many copies (known as replicas in container lingo) of your container image, you need a control plane to manage them. It provisions a replacement when a running container dies, scales the replicas, handles networking between them, and more. This is exactly what Kubernetes does.
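Point 6 describes what Kubernetes controllers do: continuously compare the desired state with the actual state and act on the difference. The toy reconciliation loop below illustrates that idea for a replica count. Real controllers watch the Kubernetes API server rather than a Python list. The `my-app-N` replica names are hypothetical.

```python
# Toy reconciliation loop: drive the running replicas toward the desired count.
# Real controllers watch the API server; replica names are hypothetical.
import itertools

_ids = itertools.count(1)


def reconcile(desired: int, running: list[str]) -> list[str]:
    running = list(running)
    while len(running) < desired:
        running.append(f"my-app-{next(_ids)}")  # replace dead or missing replicas
    while len(running) > desired:
        running.pop()                            # scale in extra replicas
    return running


pods = reconcile(3, [])   # scale out from zero to 3 replicas
del pods[0]               # simulate a container dying
pods = reconcile(3, pods) # the controller provisions a replacement
print(len(pods))          # 3
```

A single `docker run` has no such loop. If that container dies, nothing brings it back.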

Page 45

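System Design Trade-Offs

1. Reliability vs. Price: replicating infrastructure in another region or across more AZs increases reliability, but it also increases cost. The diagram shows app servers in two Availability Zones in Region A, replicated to another region.
2. Control vs. Management Overhead: with Amazon EC2, the customer manages the VM (most control, most overhead). With Lambda, AWS manages the underlying infrastructure (least overhead, least control). Other options fall along this spectrum, such as Lightsail and EKS Auto (Kubernetes the easy way: containers in managed VMs).
3. Consistency vs. Performance: DynamoDB offers single-digit millisecond response times (microseconds with DynamoDB Accelerator, DAX), while Amazon RDS follows ACID properties.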
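The reliability vs. price trade-off can be quantified with a standard formula. Assume n sites that fail independently, each with availability a, where any one healthy site can serve all traffic. Combined availability is then 1 - (1 - a)^n, while cost grows roughly linearly with n. The 99% per-site availability and $1,000/month per site below are hypothetical numbers. Correlated failures, such as a shared dependency, break the independence assumption and give lower real-world figures.

```python
# Reliability vs. price, ASSUMING independent site failures and that one healthy
# site can carry all traffic. 99% and $1,000/month per site are hypothetical.

def combined_availability(per_site: float, sites: int) -> float:
    """Probability that at least one of `sites` independent sites is up: 1 - (1 - a)^n."""
    return 1 - (1 - per_site) ** sites


for n in (1, 2, 3):
    print(f"{n} site(s): {combined_availability(0.99, n):.6%} availability, ${1000 * n}/month")
```

Going from one site to two buys two extra "nines" for double the cost. Each further site adds another two nines at the same linear increase in cost, which is why the business requirement should set the number of sites.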

Page 46

Top 3 Popular Designs

1. Synchronous Microservice: API Gateway /buy (POST) (handles traffic of cloudwithraj.com/buy (POST)) -> Lambda -> Amazon DynamoDB
2. Event Driven Architecture: API Gateway /buy (POST) -> SQS (event store) -> Lambda (processes messages from SQS for cloudwithraj.com/buy (POST)) -> Amazon DynamoDB, plus a WebSocket API
3. Microservices on Amazon EKS: Route 53 (domain: cloudwithraj.com) -> Kubernetes Ingress (ALB, url: cloudwithraj.com) with path based routing
   - /buy -> Target Group 1 -> Deployment for /buy
   - /browse -> Target Group 2 -> Deployment for /browse
   - The deployments use Amazon Aurora and Amazon DynamoDB databases
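The key difference between designs 1 and 2 is when the caller gets a response. The sketch below contrasts the two /buy flows. It uses an in-memory `queue.Queue` as a stand-in for SQS and a dict as a stand-in for DynamoDB. The handler names, the 200/202 status codes, and the order payload are illustrative assumptions, not taken from the diagram.

```python
# Synchronous vs. event-driven /buy. queue.Queue stands in for SQS and a dict
# for DynamoDB; handler names, status codes, and payloads are hypothetical.
import queue
import uuid

orders_table: dict[str, dict] = {}     # stand-in for DynamoDB
buy_queue: queue.Queue = queue.Queue() # stand-in for SQS


def buy_sync(order: dict) -> tuple[int, dict]:
    """Design 1: the caller waits while the order is written."""
    order_id = str(uuid.uuid4())
    orders_table[order_id] = order
    return 200, {"orderId": order_id}


def buy_async(order: dict) -> tuple[int, dict]:
    """Design 2: accept the request, enqueue it, and return immediately."""
    order_id = str(uuid.uuid4())
    buy_queue.put({"orderId": order_id, **order})
    return 202, {"orderId": order_id}


def consumer_drain() -> None:
    """Lambda-style consumer: process whatever messages are waiting."""
    while not buy_queue.empty():
        msg = buy_queue.get()
        orders_table[msg["orderId"]] = msg


status, body = buy_async({"item": "book"})
print(status, body["orderId"] in orders_table)  # 202 False
consumer_drain()
print(body["orderId"] in orders_table)          # True
```

In design 2 the client holds only an order ID until the consumer has run. A channel such as the WebSocket API in the diagram is one way to tell the client when processing finishes.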

Page 47

3 Tier Architecture with Microservice

Three-Tier Design:
- PRESENTATION LAYER: external facing ALB -> EC2 webservers in an Auto Scaling Group across Availability Zone 1 and Availability Zone 2
- APPLICATION LAYER: internal ALB -> EC2 appservers in an Auto Scaling Group across Availability Zone 1 and Availability Zone 2
- DATABASE: Amazon Aurora

Detailed Microservice Design (ALB with path based routing):
- /browse -> Target Group 1 (handles traffic of cloudwithraj.com/browse) -> Microservice 1: EC2 (running code, scaled up by an Auto Scaling Group) -> Amazon Aurora
- /buy -> Target Group 2 (handles traffic of cloudwithraj.com/buy) -> Microservice 2: Lambda -> Amazon DynamoDB
- /* (catch all) -> Target Group 3 (handles traffic of anything else) -> Microservice 3: EKS -> Amazon Aurora
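Path based routing evaluates listener rules in priority order, and the first match wins. That is why the /* catch-all must have the lowest priority. The sketch below models this with `fnmatch.fnmatchcase`, which gives case-sensitive `*` and `?` wildcards similar to ALB path patterns. Unlike ALB, it also treats `[...]` as a character class. The `/browse*` and `/buy*` patterns, priorities, and target group names are hypothetical.

```python
# Simplified ALB-style path routing: rules are checked in priority order and
# the first matching pattern wins. Patterns and target group names are hypothetical.
from fnmatch import fnmatchcase

RULES = [  # (priority, path pattern, target group)
    (1, "/browse*", "target-group-1"),  # Microservice 1
    (2, "/buy*", "target-group-2"),     # Microservice 2
    (3, "/*", "target-group-3"),        # Microservice 3: catch all
]


def route(path: str) -> str:
    for _, pattern, target in sorted(RULES):
        if fnmatchcase(path, pattern):
            return target
    return "default-action"


for path in ("/browse/shoes", "/buy", "/about"):
    print(path, "->", route(path))
```

If the catch-all had priority 1, every request would go to target group 3. Rule order is part of the design, not just configuration detail.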

Page 48

Pod Container Sidecar

Flow: Developer code & Dockerfile (in local machine) -> app container image stored in Amazon ECR -> pod with IP 10.15.25.215 running the app container (exposed at port 80, i.e. 10.15.25.215:80) and a sidecar container

1. In Kubernetes, containers run inside pods.
2. Generally, one application container runs inside one pod.
3. Sometimes, another container called a sidecar container runs inside the same pod alongside the application container. Sidecar containers have their own independent lifecycles: they can be started, stopped, and restarted independently of app containers. This means you can update, scale, or maintain sidecar containers without affecting the primary application.
4. Each pod has an IP address, and all containers inside the pod share that IP address. For that reason, each container is exposed on its own port. In this example, the pod IP is 10.15.25.215 and the app container is exposed at port 80.
5. One popular example of a sidecar container is the Istio service mesh, which runs an Envoy proxy sidecar inside each pod alongside the app container.
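Point 4 has a practical consequence. Containers in one pod share a network namespace, so they cannot listen on the same port, and they can reach each other on localhost. The toy model below captures the port rule. Port 15001 for the sidecar is only an example value.

```python
# Toy model of a pod's shared network namespace: one IP for all containers,
# so each listening port can belong to only one container. 15001 is an example.

class Pod:
    def __init__(self, ip: str) -> None:
        self.ip = ip
        self.ports: dict[int, str] = {}  # port -> container listening on it

    def listen(self, container: str, port: int) -> str:
        if port in self.ports:
            raise OSError(f"port {port} already used by {self.ports[port]}")
        self.ports[port] = container
        return f"{self.ip}:{port}"


pod = Pod("10.15.25.215")
print(pod.listen("app", 80))         # 10.15.25.215:80
print(pod.listen("sidecar", 15001))  # 10.15.25.215:15001
```

In a real pod, the second container to bind port 80 would fail at startup with an "address already in use" error. This is the failure the `OSError` here stands in for.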

Page 49

Control Karpenter Consolidation

1. Worker VMs are running.
2. Some nodes are underutilized.
3. Pods are consolidated (bin-packed) onto fewer worker VMs.
4. Unused nodes are terminated == significant savings.

Enable consolidation in the Karpenter NodePool YAML with consolidateAfter:
kind: NodePool
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30m

With this setting, Karpenter WAITS 30 mins after the last pod was added or deleted before consolidating. Put the value 'Never' to stop consolidation.
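The timing rule for consolidateAfter can be expressed as a small predicate. This is a simplified reading of the setting, not Karpenter's implementation. A node becomes a consolidation candidate only once no pod has been added to or removed from it for the configured duration, and "Never" disables consolidation. The timestamps are hypothetical.

```python
# Simplified reading of consolidateAfter: consolidate only after a quiet period
# with no pod changes; "Never" disables it. Not Karpenter's code; times are hypothetical.
from datetime import datetime, timedelta
from typing import Union


def can_consolidate(last_pod_change: datetime, now: datetime,
                    consolidate_after: Union[timedelta, str]) -> bool:
    if consolidate_after == "Never":
        return False
    return now - last_pod_change >= consolidate_after


changed = datetime(2024, 12, 1, 10, 0)
wait = timedelta(minutes=30)
print(can_consolidate(changed, changed + timedelta(minutes=10), wait))  # False
print(can_consolidate(changed, changed + timedelta(minutes=45), wait))  # True
```

A longer wait reduces pod churn on busy clusters, at the price of paying for underutilized nodes a little longer.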

Page 50

Cross-Account EventBridge Target

- AWS Account A (Team A): API Gateway -> EventBridge. Based on values in the message, EventBridge can fire different targets.
- Rule 1 -> SQS B -> Lambda B in AWS Account B (Team B)
- Rule 2 -> Lambda C in AWS Account C (Team C)

Page 51

Kubernetes Tech Stack

- Cloud Implementation: Amazon EKS
- Observability: Prometheus, Grafana, Fluent Bit, Jaeger, ADOT, CloudWatch, X-Ray
- Scaling: Karpenter, AutoScaling
- Delivery/Automation: Argo, Terraform, Jenkins, GitHub Actions, GitLab CICD
- Security: Gatekeeper, Trivy, ECR Scan, GuardDuty, kube-bench, Secrets Manager, Istio
- Cost Optimization: CloudWatch Container Insights, Split Cost Allocation, Kubecost

Page 52

Use Lambda to Transform, NOT Transport

API Gateway -> Lambda -> DynamoDB: is Lambda inserting/reading data without any business logic or data manipulation?
If so, remove Lambda. API Gateway can interact with many AWS services directly (API Gateway -> DynamoDB).

Page 53

Microservices Tech Stack

- Domain: cloudwithraj.com -> ALB (url: cloudwithraj.com) with path based routing
- /browse -> Target Group 1 (handles traffic of cloudwithraj.com/browse) -> Microservice 1: EC2 (running code, scaled up by an Auto Scaling Group) -> Amazon Aurora
- /buy -> Target Group 2 (handles traffic of cloudwithraj.com/buy) -> Microservice 2: Lambda -> Amazon DynamoDB
- /* (catch all) -> Target Group 3 (handles traffic of anything else) -> Microservice 3: EKS (container image from Amazon ECR) -> Amazon Aurora

Page 54

High Scale Live Event Streaming

Live Cameras in Stadium -> AWS Elemental MediaLive -> AWS Elemental MediaPackage -> CloudFront -> Devices
- AWS Elemental MediaLive: takes raw video input and transcodes it into multiple bitrates
- AWS Elemental MediaPackage: packages video in multiple streaming formats to support various devices
- CloudFront: global distribution of the live feed
- S3: saves the video for Video on Demand
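"Multiple bitrates" becomes visible to devices through a master playlist. In HLS (RFC 8216), this playlist lists one variant stream per rendition so each player can pick the best bitrate for its connection. The sketch below generates such a playlist. In this architecture MediaPackage produces the manifests. The rendition ladder and variant URIs are hypothetical.

```python
# Build an HLS master playlist (RFC 8216) listing one variant per bitrate.
# The rendition ladder and variant playlist URIs are hypothetical.

LADDER = [  # (bandwidth in bits/s, resolution, variant playlist URI)
    (6_000_000, "1920x1080", "1080p.m3u8"),
    (3_000_000, "1280x720", "720p.m3u8"),
    (800_000, "640x360", "360p.m3u8"),
]


def master_playlist(ladder: list[tuple[int, str, str]]) -> str:
    lines = ["#EXTM3U"]
    for bandwidth, resolution, uri in ladder:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(uri)  # the variant playlist follows its STREAM-INF tag
    return "\n".join(lines) + "\n"


print(master_playlist(LADDER))
```

A phone on a weak cellular link can then drop to the 360p variant while a smart TV stays on 1080p, all from the same live feed.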

Page 55

Dynamic Ad Insertion in Live Stream

Live Cameras in Stadium -> AWS Elemental MediaLive (takes raw video input, transcodes into multiple bitrates) -> AWS Elemental MediaPackage (packages video in multiple streaming formats to support various devices) -> AWS Elemental MediaTailor (inserts ads) -> CloudFront (global distribution of the live feed) -> Devices

1. CloudFront requests the manifest (video clip with ads) along with viewer information, which is used for ad personalization.
2. MediaTailor gets the manifest, i.e. the video clip with ad markers indicating the time frames at which to insert ads. At this point, NO ads have been inserted.
3. MediaTailor requests personalized ads based on the viewer information from the ADS (Ad Decision Server). The ADS is third-party software that can run on EC2.
4. The ADS running on EC2 returns the ad(s) (e.g. ad A, ad B) to MediaTailor.
5. MediaTailor inserts these ad(s) into the manifest at the ad markers.
6. MediaTailor returns the video clip with ads (the manifest with ads) to CloudFront. The clip with ads is shown to the viewers consuming from CloudFront.
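Steps 2-5 amount to "replace each ad marker with the ads chosen for this viewer". The toy stitcher below shows that idea with a manifest modeled as a list of segment names. `ad_decision_server` is a HYPOTHETICAL stand-in for the third-party ADS. Real MediaTailor works with cue markers (such as SCTE-35) in HLS/DASH manifests and VAST responses from the ADS. All segment and ad names are hypothetical.

```python
# Toy server-side ad insertion: replace each ad marker with viewer-specific ads.
# ad_decision_server is a HYPOTHETICAL stand-in for a third-party ADS.

def ad_decision_server(viewer: dict) -> list[str]:
    # Hypothetical personalization rule, for illustration only.
    return ["ad-A.ts"] if viewer.get("interest") == "sports" else ["ad-B.ts"]


def insert_ads(manifest: list[str], viewer: dict) -> list[str]:
    stitched: list[str] = []
    for entry in manifest:
        if entry == "AD-MARKER":
            stitched.extend(ad_decision_server(viewer))  # steps 3-5: fetch ads, insert at marker
        else:
            stitched.append(entry)                       # regular content segment
    return stitched


content = ["seg1.ts", "seg2.ts", "AD-MARKER", "seg3.ts"]
print(insert_ads(content, {"interest": "sports"}))  # ['seg1.ts', 'seg2.ts', 'ad-A.ts', 'seg3.ts']
```

The content segments are shared by every viewer; only the manifest differs per viewer. This is what lets the same live feed carry personalized ads at scale.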